The gene encoding the spike protein of the OC43 strain
of human coronavirus (HCV-OC43) was cloned and
sequenced. The complete nucleotide sequence revealed
an open reading frame of 4062 nucleotides encoding a
protein of 1353 amino acids with a predicted Mr of
150078. Structural features include 22 potential N-glycosylation
sites, an N-terminal hydrophobic signal sequence of 17
amino acids, a hydrophilic cysteine-rich sequence of 35
amino acids near the C terminus, and a potential
proteolytic cleavage site (RRSR) between amino acid
residues 758 and 759, yielding S1 and S2 segments of
Mr 84730 and 65366, respectively. The predicted amino
acid sequence of the spike protein of HCV-OC43 has
91% identity with that of the Mebus strain of bovine
coronavirus, with more sequence divergence in the
putative bulbous part (S1) than in the predicted stem
region (S2).


Human coronaviruses (HCV) are enveloped positive- 
stranded RNA viruses that cause respiratory infections 
and have been associated with gastrointestinal and 
neurological disorders (Jouvenne et al., 1992; McIntosh, 
1974; Macnaughton & Davies, 1981; Murray et al., 
1992; Resta et al., 1985; Stewart et al., 1992; Talbot & 
Jouvenne, 1992; Tyrrell, 1986). They are categorized into 
two major antigenic groups, represented by the prototype 
strains 229E and OC43 (Macnaughton et al., 1981; Wege 
et al., 1982). 

The HCV-OC43 virion is composed of four structural
proteins. Three of them are transmembrane proteins:
spike (S), membrane (M) and haemagglutinin-esterase
(HE). The fourth protein is an internal nucleocapsid (N)
protein, possibly associated with the internal portion of
the M protein (Sturman et al., 1980). The N protein
binds to the virion RNA, forming the nucleocapsid of
the virion (Baric et al., 1988). The coronavirus S and M
glycoproteins are synthesized in the endoplasmic
reticulum on membrane-bound ribosomes (Nieman et
al., 1982). The integral membrane M protein interacts
with the viral nucleocapsid and is believed to play a role
in determining the intracellular site of virus budding. The
S glycoprotein mediates binding of virions to the host
cell receptor (Williams et al., 1991; Delmas et al., 1992;
Yeager et al., 1992), possesses fusogenic activity, is the
major target for antiviral neutralizing antibodies (Spaan
et al., 1988; Daniel & Talbot, 1990) and can also be
recognized by T lymphocytes (Körner et al., 1991).
During maturation and intracellular transport, some S
molecules are cleaved by host cell proteases, probably
located in the Golgi apparatus, to yield two large subunits
called S1 and S2 (Frana et al., 1985). Primary sequence
analysis suggested that the bulbous part of the S protein
is formed by the N-terminal half of the molecule, S1
(Cavanagh, 1983; de Groot et al., 1987a). The C-terminal
half of the S molecule, S2, is anchored in the
virion envelope and is predicted to form an intrachain 
coiled-coil structure via heptad repeat patterns which 
would give it an elongated stem-like structure (de Groot 
et al., 1987a). 
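Heptad repeats of this kind can be illustrated with a minimal Python sketch. As a simplifying assumption, it takes a leucine recurring every seventh residue as the signature, which is the classic leucine-zipper pattern; it is not the coiled-coil prediction method used by de Groot et al. (1987a), and the function name, the `min_repeats` threshold and the toy sequence are all hypothetical.

```python
def find_leucine_zippers(protein: str, min_repeats: int = 4) -> list[tuple[int, int]]:
    """Return 1-based (start, end) spans where leucines recur every 7 residues.

    Hypothetical helper for illustration: a span is reported when at least
    `min_repeats` leucines occupy positions i, i+7, i+14, ... with no break.
    """
    protein = protein.upper()
    hits: list[tuple[int, int]] = []
    in_earlier_run: set[int] = set()
    for i, residue in enumerate(protein):
        if residue != "L" or i in in_earlier_run:
            continue
        # Step forward one heptad (7 residues) at a time while leucine persists.
        count, j = 0, i
        while j < len(protein) and protein[j] == "L":
            in_earlier_run.add(j)
            count += 1
            j += 7
        if count >= min_repeats:
            hits.append((i + 1, i + 7 * (count - 1) + 1))
    return hits


# Toy sequence (hypothetical): four leucines spaced exactly one heptad apart.
example = "AAAA" + "LAAAAAA" * 4 + "AAAA"
print(find_leucine_zippers(example))  # [(5, 26)]
```

Real coiled-coil predictors score hydrophobic residues at both the a and d positions of each heptad; tracking only leucine at d keeps the sketch short.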

HCV-OC43-infected cells contain a genomic-sized
viral mRNA plus eight subgenomic viral mRNA species
(Mounir & Talbot, 1993). These mRNAs are arranged in
a 3’-coterminal nested-set structure, in which the sequence
of every mRNA is contained within the sequence
of the next larger mRNA (Lai, 1990), and each mRNA
possesses a leader sequence identical to the 5’ end of the
genome (S. Mounir & P. J. Talbot, unpublished).
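The nested-set arrangement can be expressed as a simple property check. The sketch below uses a hypothetical toy genome and toy start positions, not HCV-OC43 data. Each mRNA is modelled as the genomic 5’ leader joined to a 3’ portion of the genome. The check confirms that every mRNA carries the leader and that each mRNA body is a suffix of the next larger body, so all bodies share the same 3’ end.

```python
def build_nested_set(genome: str, leader_len: int, body_starts: list[int]) -> list[str]:
    """Toy model: each mRNA = genomic 5' leader + a 3' portion of the genome."""
    leader = genome[:leader_len]
    return [leader + genome[start:] for start in body_starts]


def is_nested_set(mrnas: list[str], leader: str) -> bool:
    """True if all mRNAs carry `leader` and their bodies form a 3'-coterminal nested set."""
    if not all(m.startswith(leader) for m in mrnas):
        return False
    bodies = sorted((m[len(leader):] for m in mrnas), key=len, reverse=True)
    # Each smaller body must be a suffix (same 3' end) of the next larger one.
    return all(larger.endswith(smaller) for larger, smaller in zip(bodies, bodies[1:]))


# Hypothetical toy genome; the numbers are not HCV-OC43 coordinates.
genome = "ACGTTGCAAGGCTTACCGATCGAT"
mrnas = build_nested_set(genome, leader_len=4, body_starts=[4, 10, 16])
print(is_nested_set(mrnas, leader=genome[:4]))                 # True
print(is_nested_set(mrnas + ["ACGTGGGG"], leader=genome[:4]))  # False
```

A body start equal to the leader length reproduces the genome itself, corresponding to the genomic-sized mRNA.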

The nucleotide and deduced amino acid sequences of 
structural and non-structural proteins as well as the 
leader sequence of HCV-OC43 have been determined 
(Kamahora et al., 1989; Zhang et al., 1992; Mounir & 
Talbot, 1992, 1993). Given the key biological importance 
of the S glycoprotein in coronavirus pathogenesis and its
interaction with the immune system, as well as the
medical importance, both known and suspected, of
human coronaviruses, structure-function studies of the S
protein are highly important. As a first step, we now
report the nucleotide sequence of the S protein gene of
HCV-OC43 and compare it with the S protein gene of
the closely related bovine coronavirus (BCV).

[Fig. 1: nucleotide sequence (nt 1 to 4200) and deduced amino acid sequence (aa 1 to 1353) of the HCV-OC43 S gene]

Fig. 1. Complete nucleotide sequence of the S protein gene of HCV-OC43 and its deduced amino acid sequence. The intergenic
consensus sequence is doubly underlined. Potential N-glycosylation sites (°) are indicated. The N-terminal signal sequence and C-terminal
anchor domain are singly underlined. The putative proteolytic cleavage site is indicated by an arrow. Dashes indicate the
leucine zipper motif. Asterisks indicate termination codons. The conserved KWPWYVW motif preceding the transmembrane domain
is thickly underlined.

The origin and cultivation of the HRT-18 cells and the
OC43 strain of HCV as well as the preparation, reverse
transcription and PCR amplification of viral RNA were 
described previously (Mounir & Talbot, 1992). Poly(A)- 
containing RNA was selected with the PolyATtract 
mRNA isolation system (Promega) according to the 
manufacturer’s instructions. 

Four S gene-specific primers were designed for cDNA 
synthesis and PCR amplification of the HCV-OC43 S 
gene, based on the high degree of genomic similarity 
between HCV-OC43 and BCV (Mounir & Talbot, 1992,
1993). The sense S1H primer represented the sequence 5’
GCTGCATGATGCTTAGAC 3’ (nucleotides 2 to 19,
Fig. 1); the sense S2H primer was 5’ GCGATTACCACTGGTTATCGG 3’
(nucleotides 2306 to 2326, Fig. 1),
corresponding to the sequence downstream of the
putative proteolytic cleavage site; the antisense S1I
primer was 5’ CCGATAACCAGTGGTAATCGC 3’
(nucleotides 2306 to 2326, Fig. 1); and the antisense S2I
primer was 5’ GGGCGTGGCCTTAAGAAC 3’ (nucleotides
4158 to 4175, Fig. 1). Tandem EcoRI sites were added at
the 5’ end of each oligonucleotide for cloning purposes. 
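The primer design can be checked with a few lines of Python. This is an illustrative sketch only: the helper and variable names are hypothetical, and the lengths reported refer to the target region between primer ends, excluding the EcoRI tails added for cloning. It shows that the two primers placed at nucleotides 2306 to 2326 are exact reverse complements of each other, so the two PCR products overlap by 21 nt.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def product_length(sense_start: int, antisense_end: int) -> int:
    """Length of the target region spanned by a primer pair (inclusive coordinates)."""
    return antisense_end - sense_start + 1


# Primer sequences and Fig. 1 coordinates as given in the text;
# the tandem EcoRI tails added for cloning are not included.
S1_SENSE = "GCTGCATGATGCTTAGAC"         # S1H, nt 2-19
S2_SENSE = "GCGATTACCACTGGTTATCGG"      # S2H, nt 2306-2326
S1_ANTISENSE = "CCGATAACCAGTGGTAATCGC"  # antisense primer, nt 2306-2326
S2_ANTISENSE = "GGGCGTGGCCTTAAGAAC"     # S2I, nt 4158-4175

# The two primers at nt 2306-2326 target the same stretch on opposite strands.
print(reverse_complement(S1_ANTISENSE) == S2_SENSE)  # True
print(product_length(2, 2326))     # 2325 (5' product, S1 side)
print(product_length(2306, 4175))  # 1870 (3' product, S2 side)
```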

Different purified PCR products were cloned into the 
pBluescript II SK+ vector (Stratagene). Unidirectional
deletions of the inserts were created using exonuclease III 
and mung bean nuclease (Pharmacia). Sequencing was 
performed on both strands of the PCR products by the 
dideoxyribonucleotide chain termination method (Sanger 
& Coulson, 1975), with universal, reverse or specific 
primers corresponding to various regions of the S gene, 
using T7 DNA polymerase (Pharmacia) and [35S]dATP
(Amersham). Sequence analyses were performed as 
described previously (Mounir & Talbot, 1992). 

The complete nucleotide sequence of the HCV-OC43 S 
gene and its predicted amino acid sequence are shown in 
Fig. 1, together with some structural features. The 
sequence begins 15 nucleotides downstream of the 
termination codon of the HE gene (Zhang et al., 1992). 
A single open reading frame (nucleotides 32 to 4093) 
encodes a polypeptide of 1353 amino acids (aa), with a 
predicted Mr of 150078. The sequence UCUAAAC at
nucleotides 25 to 31 is identical to the conserved
intergenic sequence of BCV (Abraham et al., 1990),
murine hepatitis virus (MHV) strain A59 (Luytjes et al.,
1987) and MHV-JHM (Schmidt et al., 1987), and almost
identical to the ACUAAAC sequence found in transmissible
gastroenteritis virus (Rasschaert & Laude,
1987), porcine respiratory coronavirus (Rasschaert et al.,
1990) and feline infectious peritonitis virus (de Groot et
al., 1987b). The deduced aa sequence of the HCV-OC43
S protein contains 22 potential N-glycosylation sites, 13
in S1 and nine in S2 (Fig. 1).
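The positional statements above can be reproduced with a short Python sketch. It uses only the first 40 nt of the Fig. 1 sequence (cDNA, so U appears as T) and the ORF coordinates given in the text. The function names are hypothetical, and the sequon rule (N-X-S/T, X not P) is the standard consensus for potential N-glycosylation, assumed here rather than taken from the analysis software used in the study; the protein string in the last line is a toy example.

```python
import re

# First 40 nt of the Fig. 1 sequence (cDNA, so U is written as T).
FIG1_START = "GGCTGCATGATGCTTAGACCATAATCTAAACATGTTTTTG"


def locate_start(seq: str, intergenic: str = "TCTAAAC") -> tuple[int, int]:
    """1-based positions of the intergenic motif and of the first ATG after it."""
    motif = seq.find(intergenic)
    if motif < 0:
        raise ValueError("intergenic motif not found")
    atg = seq.find("ATG", motif + len(intergenic))
    if atg < 0:
        raise ValueError("no ATG downstream of the motif")
    return motif + 1, atg + 1


def orf_residues(first_nt: int, last_nt: int) -> int:
    """Residues encoded by an ORF whose final codon is the termination codon."""
    length = last_nt - first_nt + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return length // 3 - 1  # the stop codon encodes no residue


# Potential N-glycosylation sequon N-X-S/T with X != P; the lookahead
# lets overlapping sequons (e.g. NNTS) be reported separately.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein: str) -> list[int]:
    """1-based positions of the Asn residue of each potential sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(locate_start(FIG1_START))          # (25, 32)
print(orf_residues(32, 4093))            # 1353
print(sequon_positions("ANKSANPTNNTS"))  # [2, 9, 10] (toy sequence)
```

The ATG search starts after the intergenic motif because the upstream sequence (nt 7 to 12) contains earlier ATG triplets that lie outside the S open reading frame.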

The HCV-OC43 S protein shares several properties
with the S proteins of other coronaviruses. The first
initiation codon, at nucleotides 32 to 34, is followed by a
potential signal peptide with a possible cleavage site (von
Heijne, 1984) between aa residues 17 and 18. There are
17 hydrophobic residues near the C terminus (aa 1302 to
1318, Fig. 1) that represent the transmembrane domain.
A stretch of eight aa (KWPWYVWL, aa 1295 to 1302;
Fig. 1) of unknown function is found in all coronavirus
S proteins sequenced to date (Britton, 1991). A leucine
zipper motif (aa 1270 to 1284, Fig. 1) terminates 10 residues
upstream of this conserved KWPW motif, which lies next to the
transmembrane domain. The leucine zipper may be involved
in the oligomerization of the S protein
(Britton, 1991). A cysteine-rich hydrophilic C terminus
of 35 aa (aa 1319 to 1353; Fig. 1), which is probably the
intravirion domain, is also found in other coronavirus S
proteins (Abraham et al., 1990; Schmidt et al., 1987;
Binns et al., 1985; Luytjes et al., 1987; de Groot et al.,
1987b; Rasschaert & Laude, 1987).
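The text does not state how the transmembrane stretch was identified. One common approach is a sliding-window hydropathy scan with the Kyte & Doolittle (1982) scale, sketched below under that assumption. The window of 17 residues is chosen to match the length of the stretch described above; the function name and the toy sequence are hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(protein: str, window: int = 17) -> tuple[int, float]:
    """Return (1-based start, mean hydropathy) of the most hydrophobic window."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if len(values) < window:
        raise ValueError("sequence is shorter than the window")
    current = best = sum(values[:window])
    best_start = 0
    for i in range(1, len(values) - window + 1):
        # Slide by one residue: add the entering value, drop the leaving one.
        current += values[i + window - 1] - values[i - 1]
        if current > best:
            best, best_start = current, i
    return best_start + 1, best / window


# Hypothetical toy sequence: a 17-residue Leu stretch flanked by charged residues.
start, mean = most_hydrophobic_window("D" * 6 + "L" * 17 + "K" * 6)
print(start, round(mean, 2))  # 7 3.8
```

The running sum makes each window update constant-time. A full hydropathy plot would keep every window mean rather than only the maximum.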

The basic amino acid sequence RRSR (positions
754 to 757, Fig. 1) is located in a hydrophilic region of
the molecule (data not shown). This sequence resembles
the BCV (Mebus strain) cleavage site RRSRR (Abraham
et al., 1990), the bovine enteric coronavirus F15 strain
site RRSVR (Boireau et al., 1990) and the infectious
bronchitis virus site RRFRR (Binns et al., 1985). Cleavage of
the S protein would divide the molecule into an N-terminal
segment, S1, of Mr 84730 and a C-terminal
segment, S2, of Mr 65366. Assuming a mean Mr of 2100 for
the carbohydrate added at each N-glycosylation site (Hunter et al., 1983)
and the utilization of all sites, the mature S protein would
comprise an S1 moiety of Mr 112030 and an S2 moiety of Mr 84266, for a
total Mr of 196296. This corresponds to the observed
sizes (Mounir & Talbot, 1992). Interestingly, most
coronavirus S proteins, including that of the Mebus
strain of BCV, possess two basic amino acids at the
proteolytic cleavage site, whereas HCV-OC43 has only
one (Abraham et al., 1990; Cavanagh et al., 1986a;
Luytjes et al., 1987; Schmidt et al., 1987). Cleavage sites
of other viral surface proteins all contain one or two
basic residues (Bosch et al., 1981; Dalgarno et al., 1983;
Garoff et al., 1980; Paterson et al., 1984; Porter et al.,
1979; Rice & Strauss, 1981; Schwartz et al., 1983;
Shinnick et al., 1981). A cytopathic effect (c.p.e.) is observed upon
infection of HRT-18 cells by BCV but not by
HCV-OC43 (data not shown). It is tempting to speculate that
the number of basic amino acids at the cleavage site may
influence the efficiency of viral infection.
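The glycosylated sizes quoted above follow from simple arithmetic, reproduced here as a check. All values are those given in the text. As in the text, the sketch assumes that every potential site is used and carries a mean Mr of 2100. The sketch also makes explicit a point implicit in the figures: S1 plus S2 exceeds the intact polypeptide by 18, consistent with one water molecule being added when the S1/S2 peptide bond is hydrolysed.

```python
WATER_MR = 18         # water added when the S1/S2 peptide bond is hydrolysed
GLYCAN_MR = 2100      # mean Mr per N-linked site, as assumed in the text

S_MR, S1_MR, S2_MR = 150078, 84730, 65366   # polypeptide Mr values from the text
S1_SITES, S2_SITES = 13, 9                  # potential N-glycosylation sites


def glycosylated_mr(polypeptide_mr: int, sites: int, per_site: int = GLYCAN_MR) -> int:
    """Mr if every potential N-glycosylation site carries carbohydrate."""
    return polypeptide_mr + sites * per_site


# S1 + S2 exceeds the intact chain by one water molecule.
print(S1_MR + S2_MR - S_MR)   # 18
s1 = glycosylated_mr(S1_MR, S1_SITES)
s2 = glycosylated_mr(S2_MR, S2_SITES)
print(s1, s2, s1 + s2)        # 112030 84266 196296
```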

As shown in Fig. 2, the S protein of HCV-OC43 is 
closely similar to the corresponding protein of the Mebus 
strain of BCV, with an identity of 91%. The S proteins 
of both strains of MHV (MHV-A59 and -JHM) show
only 62 and 59% identity, respectively, with their HCV- 
OC43 counterpart (data not shown). 
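Identity figures depend on how gapped columns are treated, and the default scoring used by GeneWorks is not given here. The minimal sketch below shows one common convention, in which identity is counted over alignment columns where neither sequence has a gap. Other programs divide by the full alignment length or by the length of the shorter sequence, which gives somewhat different values. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns excluded; other programs count them differently
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment: 4 identities over 5 ungapped columns.
print(percent_identity("MKV-LT", "MRVALT"))  # 80.0
```

To compare divergence in S1 and S2 separately, both aligned strings can be sliced at the alignment column of the cleavage site and each half passed to the same function.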

The S protein of HCV-OC43 is composed of 1353
residues, whereas the S protein of BCV contains 1363
residues. This difference appears in the N-terminal S1
region. The function of the additional sequence in BCV
is not known. Sequence comparison between HCV-OC43
and BCV (Fig. 2) revealed more sequence
divergence in S1 than in S2. This observation is consistent
with the model in which the S1 subunit forms
the bulbous part of the S protein and S2 the stem region
(Cavanagh, 1983; de Groot et al., 1987a). Antigenic sites
involved in virus neutralization have been identified in
both S1 and S2 (Daniel et al., 1993; Luytjes et al., 1989;
Stühler et al., 1991; Talbot et al., 1988; Takase-Yoden et
al., 1990; Vautherot et al., 1992; Cavanagh et al., 1986b).
The comparison of the amino acid sequences of HCV-OC43
and BCV S proteins indicates that these viruses
arose from a common progenitor.

[Fig. 2: aligned amino acid sequences of the HCV-OC43 and BCV S proteins]

Fig. 2. Amino acid sequence comparison of the HCV-OC43 S protein with that of BCV (Abraham et al., 1990), by alignment for
maximum identity. Dots indicate identical residues; hyphens represent gaps introduced into the sequence; the arrow indicates the
putative proteolytic cleavage site. The analysis was performed with the GeneWorks 2.2.1 program (IntelliGenetics) using default
settings.

Molecular studies of the HCV-OC43 S protein gene
are important for understanding the interaction between the
virus, the host cell and the immune system during
infection. The study of the remainder of the genome of
HCV-OC43 should provide important information on
the replication, tropism, pathogenesis and evolution of
this important human pathogen. Such studies are in
progress.